Abstract Alleles at 12 Short Tandem Repeat loci have been sequenced
to investigate candidate loci for a multiplex Short Tandem Repeat
system for forensic identification, and for single-locus
amplification of Short Tandem Repeat loci. Variation from the
consensus sequence was found at 6 loci, while one locus, D21S11, was
found to be complex in sequence. The presence of non-consensus
alleles does not rule out loci for inclusion as forensic
identification markers, but size differences between alleles of
1 base pair require very precise sizing. We suggest criteria for the
suitability of Short Tandem Repeat loci as forensic identification
markers, and propose a universal allele nomenclature for simple and
compound Short Tandem Repeats. The effect of the repeat unit
sequence on the evolution of Short Tandem Repeats is discussed.

Key words Short Tandem Repeats • Microsatellites •
DNA sequencing • Polymerase Chain Reaction • Forensic
DNA typing




Introduction 

Analysis of Short Tandem Repeat (STR) sequences by the polymerase
chain reaction (PCR) is becoming the method of choice for the
forensic identification of body fluids (Kimpton et al. 1993, 1994;
Fregeau and Fourney 1993; Wiegand et al. 1993). Because of problems
caused by 'shadow bands' when analysing dinucleotide repeats (Hauge
and Litt 1993), the less common tri-, tetra- and pentanucleotide
repeats are preferred.

STR sequences vary in the length of repeat unit, the number of
repeats and the rigour with which they conform to an incremental
repeat pattern. 'Simple' repeats contain units of identical length
and sequence; 'compound' repeats comprise 2 or more adjacent simple
repeats; 'complex' repeats may contain several repeat blocks of
variable unit length, along with more or less variable intervening
sequences.

We have recently studied sequence variation at 2 complex STR loci,
HUMACTBP2 (SE33) and D11S554 (Urquhart et al. 1993; Adams et al.
1993). Both these loci were originally reported to have allele sizes
which differed by 4 base pair increments (Warne et al. 1991;
Phromchotikul et al. 1992). However, our sequence data showed that
at both loci allele size differences of 1, 2 or 3 base pairs also
exist.

Since allele designation of STR PCR products depends on accurate
sizing, we investigated a range of simple, compound and complex STR
loci which were being screened in this laboratory for use as
forensic identification markers. The markers used in our quadruplex
STR system (Kimpton et al. 1993, 1994), a PstI digest of
bacteriophage lambda




Table 1 PCR primers used

Locus        Primers                                         Reference

HUMVWFA31    5' CCCTAGTGGATGATAAGAATAATCAGTATG               Kimpton et al. 1992,
             3' GGACAGATGATAAATACATAGGATGGATGG               GenBank M25858

HUMTH01      5' GTGATTCCCATTGGCCTGTTCCTC                     Polymeropoulos et al.
             3' GTGGGCTGAAAAGCTCCCGATTAT                     1991f, GenBank D00269

HUMF13A01    5' GAGGTTGCACTCCAGCCTTT                         Polymeropoulos et al.
             3' ATGCCATGCAGATTAGAAA                          1991c, GenBank M21986

HUMFES/FPS   5' GGGATTTCCCTATGGATTGG                         Polymeropoulos et al.
             3' GCGAAAGAATGAGACTACAT                         1991b, GenBank X06292

HUMCD4       5' TTGGAGTCGCAAGCTGAACTAGC                      Edwards et al. 1991,
             3' TCATGCGTCCATGGTCCGGAGCCTGAGTGACAGAGTGAGAACC  GenBank M86525

HUMPLA2A1    5' CCCACTAGGTTGTAAGCTCCATGA                     Polymeropoulos et al.
             3' TACTATGTGCCAGGCTCTGTCCTA                     1990b, GenBank M22970

HUMFOLP23    5' ATTGTAAGACTTTTGGAGCCATTT                     Polymeropoulos et al.
             3' TTCAGGGAGAATGAGATGGGC                        1991d, GenBank J00145

HUMCYAR04    5' CTCTGGAAAACAACTCGACCCTTC                     Polymeropoulos et al.
             3' TGGGTGATAGAGTCAGAGCCTGTC                     1991a, GenBank M30798

HUMTFIIDA    5' GCCTATTCAGAACACCAATA                         Polymeropoulos et al.
             3' TGGGACGTTGACTGCTGAAC                         1991e, GenBank M34960

HUMFABP      5' GTAGTATCAGTTTCATAGGGTCACC                    Polymeropoulos et al.
             3' TTACGCGTCTCGGACAGTATTCAGTTCGTTTC             1990a, GenBank M18079

HUMGABRB15   5' CTAGAAAGCTAGCAAGGTGGAT                       Dean et al. 1991,
             3' GCTCATTAAACACTGTGTTCCT                       GenBank M59216

HUMD21S11    5' ATATGTGAGTCAATTCCCCAAG                       Sharma and Litt 1992,
             3' TGTATTAGTCAATGTTCTCCAG                       GenBank M84567

DNA labelled with the fluorescent dye ROX, sized alleles precisely
but not accurately, and in our hands sizing alleles differing by
only 1 bp could not be performed without the use of an allelic
ladder. Since many laboratories are now involved in STR analysis,
and before STR data become widespread as courtroom evidence, it
would be convenient if a universal system of allele designation and
nomenclature were adopted. To this end, we have investigated 12
prospective human STR identification markers to ascertain
convenient, easily understood and scientifically accurate methods of
allele nomenclature.



Materials and methods 

The 12 loci studied and their respective PCR primer sequences are
shown in Table 1. All primers were derived from the published or
GenBank sequences, but the HUMFABP 3' primer was lengthened to
incorporate an MluI restriction site, and the HUMCD4 5' primer had a
20 bp extension, designed by Jeffreys and co-workers (Jeffreys et
al. 1991), to produce allele sizes compatible with our STR systems
(Kimpton et al. 1993). DNA was prepared from whole blood as
described previously (Gill et al. 1990). Allele designations at each
locus had been made previously (Kimpton et al. 1993). For
sequencing, heterozygotes with allele sizes differing by at least
12, and ideally 20, base pairs were selected, and between 8 and 22
alleles were sequenced at each locus. This was intended to be a
representative sample, rather than an exhaustive survey, of alleles
at each locus.



PCR amplification was performed using 10 ng of genomic DNA in a
50 µl reaction volume. Reactions included 1 × Parr-Excellence buffer
(10 mM Tris-HCl pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 1% Triton X-100;
Cambio Labs, Cambridge, UK), 1.25 U Taq polymerase (Perkin Elmer
Cetus, Norwalk, CT, USA), 200 µM each deoxyribonucleotide
triphosphate and 0.5 µM of each of the 2 primers for each locus.
Thirty-five cycles of PCR (1 min 95 °C, 1 min 54 °C, 1 min 72 °C)
were performed for the loci HUMVWFA31, HUMF13A01 and HUMFES/FPS. For
the other 9 loci the annealing temperature was 60 °C; other
conditions were identical and 35 cycles were again performed.

PCR products were electrophoresed in agarose gels, excised and
purified as described previously (Urquhart 1991). Purified PCR
products were sequenced from both ends with a Taq Dye-Deoxy
Terminator Cycle Sequencing kit (Applied Biosystems, Foster City,
Calif., USA) using the PCR primers as sequencing primers. Sequence
analysis was performed on a 373A Sequencer (Applied Biosystems,
Foster City, Calif., USA) using 373 Data Collection, 373 Analysis
and SeqEd software (Applied Biosystems, Foster City, Calif., USA).



Results and discussion 

The consensus sequences of repeat regions at the 12 loci are shown
in Figs. 1-3. The repeat unit at each locus was defined as the first
in-frame repeat unit on the strand listed in the GenBank database.
Where necessary, ambiguity codes were used, in accordance with the
recommendations of the Nomenclature Committee of the International
Union
LOCUS

Simple
HUMFES/FPS       (ATTT)8-14
HUMPLA2A1        (ATT)8-17
HUMFABP          (ATT)8-15

Simple with non-consensus alleles
HUMTH01          (TCAT)5-11
  173 bp allele  (TCAT)4CAT(TCAT)5
HUMFOLP23        (AAAC)4-8AAAA.AAAC
  174 bp allele  (AAAC)10
HUMCD4           (TTTTC)3-13
  141 bp allele  (TTTTC)3CTTTC(TTTTC)8
HUMF13A01        (GAAA)4-17GAGTAAAA
  181 bp allele  (GAAA)5AA
HUMCYAR04        AT(CTT)2TTTTGTCTATGAATGTGCCTTTTTTGAAATCATATTTTTAAAATAT(TTTA)7-13
  166 bp allele  AT(CTT)1TTTTGTCTATGAATGTGCCTTTTTTGAAATCATATTTTTAAAATAT(TTTA)7

Fig. 1 Simple repeat sequences. The variable repeat regions are
shown along with their flanking sequence where relevant. Differences
from the consensus sequence for each locus are underlined



of Biochemistry (1985). Hence, M signifies A or C, Y signifies C or
T, K signifies G or T, R signifies A or G and V signifies A, C or G.
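
As an illustration only (it is not part of the original study), the ambiguity codes listed above can be turned into a pattern that tests whether an observed repeat unit fits a designated motif such as YTTTC or KATM. The sketch below is in Python; the names IUPAC_CODES, iupac_to_regex and matches_motif are hypothetical and chosen for this example.

```python
import re

# Ambiguity codes quoted in the text, plus the four unambiguous bases.
# The dictionary name and helper names are illustrative assumptions.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC",   # A or C
    "Y": "CT",   # C or T
    "K": "GT",   # G or T
    "R": "AG",   # A or G
    "V": "ACG",  # A, C or G
}


def iupac_to_regex(motif: str) -> str:
    """Translate a motif written with ambiguity codes into a regular expression."""
    parts = []
    for code in motif.upper():
        bases = IUPAC_CODES[code]
        # A single base is emitted as-is; several bases become a character class.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def matches_motif(unit: str, motif: str) -> bool:
    """Return True if one repeat unit is compatible with the motif."""
    return re.fullmatch(iupac_to_regex(motif), unit.upper()) is not None


if __name__ == "__main__":
    print(iupac_to_regex("YTTTC"))
    print(matches_motif("CTTTC", "YTTTC"))
```

Under these assumptions, both TTTTC and CTTTC are accepted as YTTTC units, whereas a unit such as GTTTC is rejected.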

Of the 12, 8 loci (HUMFES/FPS, HUMPLA2A1, HUMFABP, HUMTH01,
HUMF13A01, HUMCYAR04, HUMFOLP23 [formerly HUMDHFRP2] and HUMCD4)
were classified as simple repeats (Fig. 1), although HUMTH01,
HUMF13A01 and HUMCYAR04 each had one common allele which differed
from the consensus, while in HUMFOLP23 and HUMCD4 there was
variation in individual repeat units. For HUMF13A01 and HUMCYAR04
the non-consensus allele was the smallest allele found, and each
involved a deletion outside the repeat region (Fig. 1). The deletion
in HUMCYAR04 was of 1 of 2 CTT trinucleotides 51 bp 5' to the repeat
region, while that in HUMF13A01 was of a GAGTAA hexanucleotide
immediately 3' to the repeat. This deletion occurred in all 6 of the
181 bp alleles sequenced. At the HUMTH01 locus, the largest common
alleles are sized at 173-174 bp, although a 178 bp allele is found
very occasionally (Edwards et al. 1992; our unpublished data). We
have sequenced 11 alleles sized at 173/174 bp, of which 8 were
173 bp and 3 were 174 bp. All the 173 bp alleles had the same
single-base deletion of a thymidine residue in the fifth of 10 TCAT
repeats. These observations have recently been reported elsewhere
(Puers et al. 1993).

The HUMFOLP23 locus has been called a simple repeat in this study,
although most alleles contain the octamer AAAA.AAAC following the
run of AAAC repeats. Nine alleles (including one 174 bp allele, the
largest allele size found at this locus) had 4-8 AAAC repeats
followed by AAAA.AAAC. One 174 bp allele had 10 AAAC repeats without
the following 8 bp. Since the two 174 bp alleles are
indistinguishable by band-sizing methods, we decided to designate
both as alleles with 10 AAAM repeats. It is not known whether
smaller alleles consisting solely of AAAC repeats exist. Similarly,
the HUMCD4 locus has been called a simple repeat although both the
GenBank sequence (Edwards et al. GenBank M86525) and one allele (out
of 11) that we sequenced had CTTTC as the fourth repeat instead of
the consensus TTTTC. Alleles were designated as YTTTC repeats.
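
The effect of designating alleles with an ambiguity-coded motif can be sketched in a few lines of Python. This is an illustrative aside, not the procedure used in the study: the function names are hypothetical, flanking sequence is ignored, and the example alleles are written out from the repeat structures described in the text.

```python
import re

# Ambiguity codes used in the text (assumed mapping for this sketch).
IUPAC_CODES = {"A": "A", "C": "C", "G": "G", "T": "T",
               "M": "AC", "Y": "CT", "K": "GT", "R": "AG", "V": "ACG"}


def iupac_to_regex(motif: str) -> str:
    """Translate an ambiguity-coded motif into a regular expression."""
    return "".join(
        IUPAC_CODES[c] if len(IUPAC_CODES[c]) == 1 else f"[{IUPAC_CODES[c]}]"
        for c in motif.upper()
    )


def count_tandem_units(seq: str, motif: str) -> int:
    """Length, in units, of the longest uninterrupted run of units matching motif."""
    unit = re.compile(iupac_to_regex(motif))
    k = len(motif)
    best = 0
    for start in range(len(seq)):
        n, pos = 0, start
        # Walk forward one unit at a time while each unit fits the motif.
        while unit.fullmatch(seq, pos, pos + k):
            n += 1
            pos += k
        best = max(best, n)
    return best


if __name__ == "__main__":
    # HUMCD4 allele with CTTTC as the fourth repeat.
    cd4 = "TTTTC" * 3 + "CTTTC" + "TTTTC" * 8
    # The two 174 bp HUMFOLP23 allele structures described in the text.
    folp_a = "AAAC" * 8 + "AAAA" + "AAAC"
    folp_b = "AAAC" * 10
    print(count_tandem_units(cd4, "YTTTC"))
    print(count_tandem_units(folp_a, "AAAM"), count_tandem_units(folp_b, "AAAM"))
```

Counting with the strict consensus unit (TTTTC or AAAC) would split such alleles at the variant repeat, which is the reason the ambiguity-coded motif gives the two 174 bp HUMFOLP23 alleles the same designation.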

Three loci (HUMGABRB15, HUMTFIIDA and HUMVWFA31) were classified as
compound repeats (Fig. 2).



LOCUS

Compound
HUMGABRB15       (GATA)5-13(GATC)1-4(TATC)1-2
HUMTFIIDA        (CAG)3(CAA)3(CAG)nCAA(CAG.CAA)0-1(CAG)nCAA.CAG

Compound with non-consensus alleles
HUMVWFA31        (ATCT)2(GTCT)3-4(ATCT)9-13
  144 bp allele  (ATCT)2(GTCT)4(ATCT)5AT(ATCT)4

Fig. 2 Compound repeat sequences. The variable repeat regions are
shown. Differences from the consensus sequence for the HUMVWFA31
locus are underlined
Fig. 3 Complex repeat sequence at the D21S11 locus. The variable
repeat region is shown along with 33 bp of 3' sequence for the
GenBank sequence (Sharma and Litt 1992) and our consensus sequences
for the odd-numbered and even-numbered alleles. The invariant
hexanucleotide which we found in all alleles and the 2 amendments we
made to the GenBank sequence are singly underlined. The
hexanucleotide found in all even-numbered alleles is doubly
underlined. The segments marked x and y are those used for allele
designation (see text)



The HUMGABRB15 locus consists of blocks of GATA, GATC and TATC
repeats, all of which vary in number between alleles. The aggregate
number of the 3 repeat types (i.e. the number of KATM repeats) was
used for allele designation. Similarly, the HUMTFIIDA locus contains
7 or 9 blocks of CAG or CAA repeats, and allele designation was by
the aggregate number of CAR repeats.

Apart from one allele in one individual, HUMVWFA31 was a
straightforward compound repeat with the sequence
(ATCT)2(GTCT)n(ATCT)n, and was designated as an RTCT repeat.
However, one non-consensus allele of 144 bp was observed in which
the 3' ATCT tract contained an AT dinucleotide (Fig. 2). This could
have arisen either by deletion of TC from a 146 bp 16 allele or by
duplication of a dinucleotide in a 142 bp 15 allele. This allele was
the only HUMVWFA31 allele seen which differed from the 4 bp repeat
pattern in over 1500 alleles sized at the locus (unpublished data).

The repeat at the D21S11 locus was classified as a complex repeat.
Although originally reported to have 4 bp increments between alleles
(Sharma and Litt 1992), later work (Kimpton et al. 1993) revealed
that alleles differing by 2 bp were common. The consensus sequences
of the repeat at the D21S11 locus are shown in Fig. 3, aligned with
the sequence from GenBank (Accession number M84567; Sharma and Litt
1992). A total of 16 alleles was sequenced, including the largest
and smallest, and 2 pairs of identically sized alleles, one pair of
which had identical sequence. The variable sequence consisted
largely of TCTA and TCTG repeat blocks, although an invariant TA
dinucleotide and an invariant TCA trinucleotide were also present.
Alleles which were given even-numbered allele designations (see
'Allele designation and nomenclature' below) had a TATCTA
hexanucleotide after the final block of TCTA repeats. Alternatively,
this can be viewed as a TA insertion before the last TCTA repeat. In
all 16 alleles sequenced, a TCCATA hexanucleotide was found which
does not appear in the GenBank sequence (Sharma and Litt 1992). If
the absence of this hexanucleotide genuinely occurs, this is a
further mechanism for 2 bp allele differences. Sequencing the D21S11
locus also allowed us to make 2 amendments to the GenBank sequence.
In all 16 alleles sequenced, the N at position 288 in the GenBank
sequence was a C, and the NN at positions 295-6 was a single T
residue (Fig. 3).



Allele designation and nomenclature 

Designation and nomenclature of alleles at STR and VNTR loci has
been fairly haphazard in the past. For presentation of forensic
evidence in the courts, and for transfer of data between different
laboratories, some standardisation of allele designation is
necessary. The most widely applicable method would be to call each
allele by its length in base pairs. This method would be suitable
for VNTRs, normal STRs and hypervariable STRs such as HUMACTBP2
(Urquhart et al. 1993) and D11S554 (Adams et al. 1993). However, the
allele size is dependent on the primers used, and requires a precise
and accurate sizing method. An alternative is to call alleles by the
number of repeat units they contain. This is easy for simple repeats
and some VNTRs, and can be applied to compound repeats with the use
of ambiguity codes, but is too cumbersome for complex repeats.
Problems also arise when intermediate alleles occur, as with the
HUMVWFA31 144 bp allele in this study and the various anodic and
cathodic allele variants in some VNTR systems. Both allele
designation methods discussed above may involve loss of
informativeness, since the repeat pattern at any individual allele
is not specified. However, this is inevitable with all methods which
distinguish alleles solely by size, and the increase in
informativeness gained by sequencing every allele is more than
offset by the increased cost. In line with the recommendations of
the DNA Commission of the International Society for Forensic
Haemogenetics (1992) we have called alleles at all simple and
compound repeat loci by their repeat number, using redundancy codes
for compound repeats. For intermediate alleles and other alleles
that fail to align with the incremental ladder at each locus, digits
after a decimal point were used to indicate the number of base pairs
by which the allele exceeded the previous 'rung' of the ladder.
Thus, the 144 bp allele at the HUMVWFA31 locus was designated 15.2,
the 166 bp allele at the HUMCYAR04 locus was designated 6.1, the
173 bp allele at the HUMTH01 locus was designated 9.3, and the
181 bp allele at the HUMF13A01 locus was designated 3.2. It should
be noted that the use of the number after the decimal point does not
necessarily imply the presence of a partial repeat (cf. HUMF13A01,
HUMCYAR04), but may indicate variation outside the repeat region.
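
The decimal-point convention described above amounts to a division with remainder, which can be sketched as follows. This is an illustrative Python sketch, not code from the study; the function name designate_allele is hypothetical, and the reference rungs used in the example (for instance HUMVWFA31 allele 12 at 130 bp) are taken from the allele sizes reported in Table 3.

```python
def designate_allele(size_bp: int, rung_size_bp: int, rung_repeats: int,
                     unit_len: int) -> str:
    """Designate an allele relative to a known rung of the allelic ladder.

    size_bp       sequenced size of the allele to be named
    rung_size_bp  size of any consensus allele at the same locus
    rung_repeats  repeat number of that consensus allele
    unit_len      length of the repeat unit in base pairs
    """
    # Floor division places the allele on the previous rung; the remainder is
    # the number of base pairs by which it exceeds that rung.
    whole_units, extra_bp = divmod(size_bp - rung_size_bp, unit_len)
    repeats = rung_repeats + whole_units
    return str(repeats) if extra_bp == 0 else f"{repeats}.{extra_bp}"


if __name__ == "__main__":
    print(designate_allele(144, 130, 12, 4))  # HUMVWFA31 144 bp allele
    print(designate_allele(166, 169, 7, 4))   # HUMCYAR04 166 bp allele
    print(designate_allele(173, 154, 5, 4))   # HUMTH01 173 bp allele
    print(designate_allele(181, 183, 4, 4))   # HUMF13A01 181 bp allele
```

Because the remainder is taken from a floor division, alleles smaller than the chosen reference rung (as for HUMCYAR04 and HUMF13A01) are still counted upwards from the rung below them.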
Table 2 Characteristics of (TCTA)5-containing human loci in GenBank

Columns: Locus; Accession number; Reference; Sequence motif:
Alternative repeat (longest block) TCTG, TCCA; Shortened repeat (a)
(number present) TCA, TA

D2isn 


M84567 




5-6 


1 


\ ij J 


HUMVWFA3i(A)^» 


M25858 


Thi^ (itiirlv 


.3-4 


2 




HUMVWFA31(B)'» 




N/lnn<"ii^n et n1 1 


2 





I — 


HUM VWFA3 1(C)** 




Mancuso cl al 1989 


1 


_ 


2 _ 


HUMGABRBI5 


M59216 




_ 


_ 




T03428 


T03428 


(Chan et al 19Q2 








D18S53 


Z16461 


Wri*iv:mh:irh pf nl 1Q09 

TV k^lSSLrllLluVlf Vl tilt J 77^ 




1 




D2SI2I 


Z I 6545 


Weissenbach et al. 1992 








D7S513 


Z 16989 


Weissenbach et al. 1992 


1 






DI()S225 


ZI7I56 


Wci»;<icnhach et al 1992 


1 


2 




HS7SKP4I 


X04237 


Miirnhv et al 1984 

ivjuj|Jiijr Cl til. 1 70*+ 


1 






HSCSFIPO 


X 14720 


Hnmnp pt nl IQ5tQ 
naiiipc CL al. ivo? 








HSIGK6 


700004 


j<.it,niv„ncn ci ai. J70'+ 








HUMBKM 


M35828 


Frirk«:rtn et i1 tQ}{K 

LJllLlV^UIl Cl ul. 1 70n 


1 




— i 


HUMCD4rA^*' 




rLUWdrus Cl ui. uciiOdriK. 






t 

1 — 


HUMCD4(B)*' 




CUWalUS Cl ui. WCllDaUK 


1 




1 

— J 


HUMCD4fD^'' 




CUWuiUrt Cl ui. VJCIIOulIK 


2 


2 


1 


HUMHPRTB 


M 26434 


rMlSOrgC Cl al. I7W/ 


1 






HUMRPTPOLF 


102066 


w cuci aiiu ivlay 1707 






1 

1 — 


HUMSIRPOBD 


M87698 


nuusuii Cl ui. 177^ 








HUMSIRPOCP 


M 87736 


nuusuii Cl di. 177^ 


1 




1 — 


HUMPGKI 


S75476 


T^IIpv pt 1Q01 
Ixlicy cl al. i77l 


1 






HUMPAH 


LI 0305 


VJUIISIJV Cl UI. 1 77.1 


1 




J — 


DXS98I 


M38419 


Mahtani and Willard 1993 


1 




0/1 


DI2S66 




Roeweretal. 1992 


1 






D12S67 




Roeweretal, 1992 


6 


7c 




DYS19 




Roewerei al. 1992 


1 


7C 


1 



(a) TCA trinucleotides and TA dinucleotides were only included if
they had at least one TCTA, TCTG or TCCA tetranucleotide repeat
immediately adjacent on each side
(b) HUMVWFA31 A region: bases 1683-1770; B region: bases 1889-2063;
C region: bases 2084-2343. HUMCD4 A region: bases 5624-5686;
B region: bases 5944-6043; D region: bases 7101-7340
(c) Full sequence not given in reference



Allele designation for the complex repeat at D21S11 is more
problematical. Each allele contains a mixture of di-, tri-, tetra-
and hexanucleotides (Fig. 3). Three options were considered: naming
alleles by length in base pairs, using an arbitrary system of allele
designation, and naming alleles by the number of TV dinucleotide
repeats. We decided on the third option largely because it was
consistent with nomenclature at the other loci in this system.
However, there were 2 minor inconsistencies. The system of
nomenclature excludes the invariant TCA trinucleotide, and treats
the CA in the centre of the TCCATA hexanucleotide as a TV
dinucleotide. The allele designation at the D21S11 locus thus
indicates the aggregate number of TV dinucleotides (plus one CA
dinucleotide) in the two regions labelled x and y in Fig. 3.



TCTA and related repeats 

We noted several similarities between the sequences of D21S11 and
HUMVWFA31. When written as TCTA repeats, both contain compound
(TCTG)n(TCTA)n regions, and the sequence motif (TCTA)nTA(TCTA)n
appears (either once or twice) in D21S11 alleles and in the 144 bp
non-consensus HUMVWFA31 allele. A search of GenBank human sequences
for (TCTA)5 and its complement (TAGA)5 produced 22 matches,
including D21S11, HUMGABRB15 and 3 regions of HUMVWFA31 (Table 2).
These sequences, plus 5 others recently published, DXS981 (Mahtani
and Willard 1993), phenylalanine hydroxylase (Goltsov et al. 1993),
D12S66, D12S67 and DYS19 (Roewer et al. 1992), were examined for
sequence motifs common to several sequences. These are summarised in
Table 2. Many of the TCTA repeat blocks had single repeat units that
differed by one base from the consensus repeat, the commonest being
TCTG and TCCA (occurring in 67% and 21% of sequences respectively).
At some loci, notably D21S11, HUMVWFA31 and D12S67, longer blocks of
TCTG were present. Truncated repeats were also present at some loci.
The commonest trinucleotide, TCA, occurred at 37% of the loci, while
the dinucleotide TA was present in 22% of loci. The frequency of
these truncated repeats may well be an underestimate, since many of
the reported TCTA repeats were discovered by hybridisation to
(TAGA)n or similar oligonucleotides, the binding of which would be
decreased by imperfect repeats.

Searches were also performed for (TCTG)5 and (TCCA)5, but both
failed to find sequences associated with extensive TCTA repeats.
However, the TCCA repeat in the HUMIGCAAA locus (Yu et al. 1990)
included 2 separate TCA trinucleotides. Interestingly, the
non-consensus 9.3 allele at the HUMTH01 locus (see above) can be
regarded as a TCA trinucleotide in the middle of a (TTCA)n tract.
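
A search of this kind, for at least five tandem copies of a tetranucleotide on either strand, can be approximated with a regular expression. The Python sketch below is illustrative only and does not reproduce the database search actually performed; the function names and the hit format (strand, start position, number of copies) are assumptions made for the example.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_repeat(seq: str, unit: str = "TCTA", min_copies: int = 5):
    """List (strand, start, copies) for runs of at least min_copies units.

    '+' hits contain the unit itself; '-' hits contain its reverse
    complement, i.e. the unit lies on the opposite strand.
    """
    hits = []
    for strand, motif in (("+", unit), ("-", reverse_complement(unit))):
        pattern = re.compile(f"(?:{motif}){{{min_copies},}}")
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), len(match.group()) // len(unit)))
    return hits


if __name__ == "__main__":
    print(reverse_complement("TCTA"))
    print(find_repeat("GG" + "TCTA" * 6 + "CC"))
    print(find_repeat("AA" + "TAGA" * 5 + "TT"))
```

As the text notes for hybridisation screens, an exact-match search of this type will miss tracts interrupted by variant or truncated units, so any count of hits is likely to be a lower bound.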

The sequence at the HUMGABRB15 locus (Fig. 2) can be written as
(TAGA)5-12(TCGA)1-3(TCTA)1-2, i.e. 2 mutually palindromic TCTA
tracts surrounding a TCGA tract. It is possible that the central
tract is formed by limited gene conversion, the 2 TCTA tracts in
opposite orientation acting on each other.

These observations suggest a scenario for evolution of TCTA repeats.
Since the TCA trinucleotide could not be generated by duplication
from (TCTA)n, we would suggest that this unit originates by deletion
of a thymidine residue from TCTA. By analogy, TA dinucleotides may
arise by deletion of C from TCA or, alternatively, by deletion of TC
or CT from TCTA. Of course, they could also arise by TA duplication.
In a simple TCTA repeat (e.g. D2S121) individual TCTA units may
mutate to TCTG (or TCCA), giving imperfect but essentially simple
repeats (e.g. D7S513). These imperfections may then expand, either
by genuine size expansion or by gene conversion (Jackson and Fink
1981; Slightom et al. 1988), producing a compound repeat such as
HUMVWFA31. TCTA (and TCCA) units may also undergo deletion to TCA or
TA, producing complex repeats such as D21S11. Variation at the
D21S11 locus suggests that repeat expansion can continue after
events such as TCA generation. These events would lead to
degeneration of simple TCTA repeats with time into complex repeats,
containing TCTA, TCTG and TCCA blocks interspersed with dinucleotide
and trinucleotide truncated repeats. Indeed, some degenerate TCTA
repeats such as those at the HUMVWFA31 and HUMCD4 loci (Mancuso et
al. 1989; Edwards et al. GenBank M86525) can reach thousands of bp
in length.



Other repeats 

As discussed above, the most common mutation in TCTA repeats is to
TCTG, i.e. an A > G transition. The HUMTFIIDA and HUMCD4 loci appear
to have developed by similar events, respectively by CAG > CAA and
AAAAG > GAAAG. Only at the HUMFOLP23 locus is there an apparent
transversion (A > C); in this case, the mutational event could also
be a deletion. The predominance of transitions is also seen at AAAG
repeats (Urquhart et al. 1993; Adams et al. 1993) where the usual
non-consensus repeat is AAGG.
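
The distinction drawn here between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) can be checked mechanically. The following Python sketch is illustrative only; the function names are hypothetical, and it compares units of equal length position by position, so it cannot represent the alternative deletion explanation mentioned for HUMFOLP23.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("no substitution: bases are identical")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def unit_substitutions(consensus: str, variant: str):
    """List (position, ref, alt, class) for each mismatch between two units."""
    if len(consensus) != len(variant):
        raise ValueError("units must have the same length")
    return [
        (i, r, a, classify_substitution(r, a))
        for i, (r, a) in enumerate(zip(consensus, variant))
        if r != a
    ]


if __name__ == "__main__":
    for cons, var in (("TCTA", "TCTG"), ("CAG", "CAA"),
                      ("AAAG", "AAGG"), ("AAAC", "AAAA")):
        print(cons, var, unit_substitutions(cons, var))
```

Applied to the repeat units named in the text, the first three comparisons are reported as transitions and only the AAAC versus AAAA comparison as a transversion.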



Non-consensus alleles 

Of the 12 loci studied, 6 showed non-consensus alleles, and 2 of
these, HUMFOLP23 and HUMCD4, only differed from other alleles in
sequence and could not therefore be distinguished by sizing.
Non-consensus alleles showed a distinct tendency towards the ends of
allele size ranges. That at HUMCD4 was the second largest allele,
while at HUMTH01 the 9.3 allele was 1 bp smaller than the largest
common allele. The HUMFOLP23 non-consensus allele was the same size
as the largest alleles found, and those at HUMF13A01 and HUMCYAR04
are the smallest alleles found at the loci. Only the extremely rare
HUMVWFA31 15.2 allele falls towards the middle of the allele size
range.

It is possible that mutation to non-consensus alleles is a mechanism
which prevents both extreme expansion to high repeat numbers and
extreme contraction to low-number, non-polymorphic, repeats. For the
2 loci where the non-consensus allele is at the low end of the size
range, HUMF13A01 and HUMCYAR04, deletions outside the repeat
sequence cause the non-consensus allele, while those at the top end
of their range are caused by either deletion within a repeat
(HUMTH01) or substitution within a repeat (HUMFOLP23 and HUMCD4).

Alleles at the D21S11 locus show a bimodal distribution, with
odd-numbered and even-numbered alleles showing distributions over
different size ranges (Fig. 4). Presumably this is due to the effect
of the TA dinucleotide on increase or decrease of repeat number over
time.

Alleles at other polymorphic STR loci require investigation to
determine the extent to which sequence affects evolution at these
loci.



Implications for forensic use 

We have surveyed 12 STR loci to investigate candidate loci for an
STR system for forensic identification. The major considerations for
selection of loci were discriminating power, absence of linkage,
agreement with Hardy-Weinberg equilibrium, low levels of 'shadow
bands' (Hauge and Litt 1993), compatibility with other loci (for a
multiplex STR system) and accurate sizing of alleles. Where alleles
differ by 2 bp or more, sizing using the PstI digest of
bacteriophage lambda as a marker consistently distinguished alleles,
but alleles differing by 1 bp required sizing by an allelic ladder.
Hence, the non-consensus alleles at HUMF13A01 and HUMVWFA31 were
sized and designated accurately. However, at the HUMTH01 locus, the
9.3 and 10 alleles were treated as a single pooled allele group,
since it was not possible to consistently distinguish between them.
This led to a slight loss of informativity at this locus.

The sequence data presented here are of less relevance to
non-fluorescent STR analysis in which allele designation is by
comparison with an allelic ladder. However, the allele designations
suggested here are relevant whichever method of analysis is used.
Small (1 or 2 bp) differences in allele sizes can cause problems
using non-fluorescent detection methods, particularly where there is
an appreciable difference in mobility between denatured DNA strands.

In the future, an ideal multiplex STR system would consist of loci
in which alleles differ by a minimum of 2 bp. The presence of
non-consensus alleles does not rule out loci for inclusion as
forensic identification markers, but size differences between
alleles of 1 bp require very precise sizing. With the use of allelic
ladders, more discriminating hypervariable loci such as HUMACTBP2
(Urquhart et al. 1993) and D11S554 (Adams et al. 1993) could be
used. D21S11, though complex in sequence, can be sized in our system
and, as a highly discriminating locus, would be a useful component
of a multiplex STR system.

From the loci investigated in this study, we have developed a
quadruplex STR system including the loci HUMFES/FPS, HUMTH01,
HUMF13A01, and HUMVWFA31 (Kimpton et al. 1993, 1994), and further,
more discriminating, systems are under investigation.

Fig. 4 Allele distribution at the D21S11 locus. Odd-numbered alleles
are solid, even-numbered are hatched, showing two overlapping normal
distributions. Data are pooled from 157 British individuals (i.e.
314 chromosomes) of various racial origin

Table 3 Allele size ranges and non-consensus alleles at the 12 STR
loci. Sizes are as determined by sequencing the alleles

Locus        Repeat   Consensus alleles              Non-consensus
                      Smallest       Largest         allele
                      allele         allele

HUMFES/FPS   ATTT     8 (211 bp)     14 (235 bp)
HUMPLA2A1    ATT      8 (110 bp)     17 (137 bp)
HUMFABP      ATT      8 (213 bp)     15 (234 bp)
HUMTH01      TCAT     5 (154 bp)     11 (178 bp)     9.3 (173 bp)
HUMF13A01    GAAA     4 (183 bp)     17 (235 bp)     3.2 (181 bp)
HUMCYAR04    TTTA     7 (169 bp)     13 (193 bp)     6.1 (166 bp)
HUMFOLP23    AAAM     6 (158 bp)     10 (174 bp)     10 (174 bp)
HUMCD4       YTTTC    3 (96 bp)      13 (146 bp)     12 (141 bp)
HUMGABRB15   KATM     9 (124 bp)     15 (148 bp)
HUMTFIIDA    CAR      27 (168 bp)    40 (207 bp)
HUMVWFA31    RTCT     12 (130 bp)    21 (166 bp)     15.2 (144 bp)
D21S11       TV (a)   56 (209 bp)    76 (249 bp)

(a) Excluding TCA trinucleotide (see text)
Sequencing of multiple recombinant clones generated from polymerase chain reaction-amplified products
demonstrated that the degree of heterogeneity of two well-conserved regions of the hepatitis C virus (HCV)
genome within individual plasma samples from a single patient was consistent with a quasispecies structure of
HCV genomic RNA. About half of circulating RNA molecules were identical, while the remainder consisted of
a spectrum of mutants differing from each other in one to four nucleotides. Mutant sequence diversity ranged
from silent mutations to appearance of in-frame stop codons and included both conservative and
nonconservative amino acid substitutions. From the relative proportion of essentially defective sequences, we
estimated that most circulating particles should contain defective genomes. These observations might have
important implications in the physiopathology of HCV infection and underline the need for a population-based
approach when one is analyzing HCV genomes.



Hepatitis C virus (HCV), a 10-kb positive-stranded RNA 
virus, has recently been shown to be the major causative 
agent of parenterally transmitted non-A, non-B hepatitis (1, 
5, 11, 21). Despite little overall primary sequence identity, 
the genetic organization of HCV has been shown to be 
similar to those of flaviviruses and pestiviruses (6, 26, 33). 
Significant genetic heterogeneity has been reported among 
isolates from different geographic areas (17) and within 
single isolates from the same individual (17, 20, 33-35). 
Comparative sequence analysis of the different HCV isolates 
has shown, however, that the degree of variability is un- 
evenly distributed throughout the HCV genome, with some 
very well conserved regions (6, 13, 17, 33-35) and some
highly variable genes (17, 20, 33-35). In addition, the rate of 
fixation of mutations of the HCV genome has been estimated 
to be similar to that of other RNA viruses (approximately 
10⁻³ to 10⁻⁴ base substitutions per genome site per year),
and evidence suggesting that the HCV genome may rapidly 
evolve in vivo, with different rates of evolution for different 
regions of the viral genome, has been provided (27). 

Since the pioneering studies by Batschelet et al. (3) 
providing evidence that RNA virus heterogeneity is a con- 
sequence of high error rates in RNA replication, data have 
accumulated which suggest that most RNA viruses consist 
of a heterogeneous mixture of circulating related genomes 
containing a master (most frequently represented) sequence
and a large spectrum of mutants, a genomic distribution
referred to as quasispecies (10, 32). This quasispecies model 
of mixed RNA virus populations implies a significant adap- 
tation advantage because the simultaneous presence of mul- 
tiple variant genomes (and the high rate of generation of new 
variants) allows for the rapid selection of the mutant(s) with
better fitness for any new environmental condition. On the 
other hand, a quasispecies will remain in stable equilibrium 
with little evolution of its consensus or master sequence as
long as conditions are unchanged (7, 15, 18, 22). 

Many important biological implications predicted by the 
quasispecies model (8) have been found in several virus 
systems displaying such genomic structure, including vacci- 
nation failure through selection of neutralizing antibody 
escape mutants (19), establishment of persistent infection by 
selection of neutralizing antibody or cytotoxic T-lymphocyte 
escape mutants (23, 28, 30) or by generation of defective 
interfering particles (14), resistance to antiviral agents (29), 
and changes in cell tropism or virulence (9, 12). 

In an attempt to define the degree of heterogeneity among 
circulating HCV RNA molecules within individual isolates, 
we analyzed a series of four sequential HCV isolates from an
individual with transfusion-associated HCV and human
immunodeficiency virus coinfection. Polymerase chain
reaction (PCR)-amplified products of cDNA corresponding to
fragments of the 5' untranslated region (5'UTR) and non- 
structural region 3 (NS3) of the HCV genome were cloned 
into a bacterial vector, and 6 to 20 recombinant clones from 
each sample were sequenced. A quasispecies distribution of 
sequences was observed in both genomic regions and in all 
sequential samples. 
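
The clone-level analysis described above (find the most frequent sequence, then describe every other clone by its differences from it) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `summarize_quasispecies` and the toy sequences are hypothetical, and the clones are assumed to be already aligned to equal length.

```python
from collections import Counter


def summarize_quasispecies(clones):
    """Return the master sequence, its share of clones, and the mutants.

    Each mutant maps to (dotted alignment, number of differences), where a
    dot marks identity with the master sequence.
    """
    if len({len(seq) for seq in clones}) != 1:
        raise ValueError("clone sequences must be aligned to equal length")
    counts = Counter(clones)
    master, master_count = counts.most_common(1)[0]
    mutants = {}
    for seq in counts:
        if seq == master:
            continue
        # Keep only the bases that differ from the master sequence
        dotted = "".join(b if b != m else "." for b, m in zip(seq, master))
        n_diff = sum(b != m for b, m in zip(seq, master))
        mutants[seq] = (dotted, n_diff)
    return master, master_count / len(clones), mutants


# Hypothetical toy clones (not data from the study)
toy_clones = ["ACGTACGT"] * 3 + ["ACGTACGA", "ATGTACGT"]
master, share, mutants = summarize_quasispecies(toy_clones)
print(master, f"{share:.0%}")
for dotted, n_diff in mutants.values():
    print(dotted, n_diff)
```

The dotted output mirrors the convention of showing only the nucleotides that differ from the master sequence.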

HCV RNA extracted from 75 µl of plasma by the acid
guanidinium thiocyanate-phenol-chloroform method (4) was
reverse transcribed into cDNA and PCR amplified in a single
tube reaction (RT-PCR; kit N 808-007, Perkin Elmer) for 35
cycles (5 cycles of 94°C for 2 min, 50°C for 2 min, and 72°C
for 3 min; 30 cycles of 94°C for 1.5 min, &fC for 2 min, and
72°C for 3 min), using specific oligonucleotide primers of the
5'UTR (13) and NS3/NS4 regions (16). As represented in
Fig. 1, the amplified products were 237 and 584 bp long, and
their specificity was confirmed by Southern hybridization
with ³²P probes. 5'UTR and NS3/NS4 products were
cleaved with NotI and SacI-NcoI, respectively, yielding
restriction products of 177 and 240 bp, respectively, which
were subsequently cloned into pSL1190 vector (Pharmacia-
FIG. 1. Organization of the HCV genome. Putative structural
genome regions: core (C), envelope (E1 and E2/NS1), nonstructural
protein coding regions (NS1 to NS5). Amplified 5'UTR and NS3/NS4
products are depicted below, with their respective ends indicated
by their corresponding nucleotide positions from the HCV-1
prototype (6, 13). Horizontal solid lines represent restriction
fragments generated from each product for subsequent cloning and
sequencing. Oligonucleotide primers JHC 52
(5'-AGTCTTGCGGCCGCACGCGCCCAAATC), JHC 93
(5'-TTCGCGGCCGCACTCCACCATGAATCACTCCCC), a6/16A
(5'-GCATGTCATGATGTAT), and C36/16B (5'-GCAATACGTGTGTCAC) used for cDNA
synthesis and PCR amplification were synthesized according to the
HCV-1 prototype sequence (13, 16).



LKB). Positive recombinant clones were subcloned into
M13mp19. Six to twenty independent recombinant subclones
were sequenced by the dideoxy-mediated chain termination
method (31). The Maxam-Gilbert chemical degradation
method (24) was used to resolve sequencing
ambiguities.

When the sequences of 20 independent NS3 clones from
the sample obtained 2 months posttransfusion were compared
(Table 1 and Fig. 2a), 9 sequences (45%) were identical
(master sequence) and 11 (55%) differed from each other and
from the master sequence in one to four nucleotides. Similarly,
of the 20 5'UTR clones from the same sample, 12
sequences (60%) were identical and 8 (40%) contained
single, nonrepetitive base substitutions (Table 1 and Fig. 2b).
The predicted amino acid sequence encoded by the master
NS3 sequence was identical to that of the corresponding
HCV-1 prototype (16). However, of the 25 base substitutions
observed (Fig. 2a) in the 11 mutant sequences, only 5 (20%),
occurring in the third codon position, were silent mutations.
Of the 20 nucleotide changes observed in the first and second
codon positions, 6 (24%) led to conservative amino acid
changes, 11 (44%) introduced drastic amino acid changes
(five Ile → Thr, three Leu → Arg, one Asp → Val, one Met
→ Thr, and one Tyr → His), and 3 (12%) led to the
appearance of in-frame stop codons in 2 of the 20 sequences.
Although the functional implications of single amino acid
substitutions (whether conservative or nonconservative) are
unknown, it is obvious that mutants containing in-frame stop
codons are essentially defective. It must be borne in mind



TABLE 1. Sequence complexity of HCV quasispecies

Sample  Posttransfusion  Genomic region  No. of clones  No. of identical  No. of different
        mo               cDNA            sequenced      nucleic acid      nucleic acid
                                                        sequences         sequences

VL-1    2                NS3             20             9                 11
                         5'UTR           20             12                8
VL-2    3                5'UTR           10             7                 3
VL-3    6                5'UTR           10             7                 3
VL-4    29               5'UTR           6              4                 2


that the sequenced fragment of the NS3 region encodes only
80 of a total of 3,011 residues (i.e., 2.8% of the coding
capacity of the HCV genome). Assuming that the NS3 region
is representative of the HCV open reading frame, the finding
that 10% of these sequences include stop codons indicates
that most circulating particle populations must contain
defective genomes.
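
One way to see why a 10% stop-codon frequency in a fragment covering 2.8% of the coding capacity points to a mostly defective population is the back-of-the-envelope extrapolation below. It is a sketch under an assumption the text does not state: that stop codons arise independently and at a uniform rate along the open reading frame. The function name is illustrative only.

```python
def fraction_defective(stop_fraction_in_fragment, fragment_share_of_orf):
    """Extrapolate the share of genomes carrying at least one stop codon.

    Assumes stop codons occur independently and uniformly along the open
    reading frame (an illustrative assumption, not stated in the text).
    """
    # Number of fragment-sized windows needed to cover the whole ORF
    n_fragments = 1.0 / fragment_share_of_orf
    # Probability that every window is free of stop codons
    p_intact = (1.0 - stop_fraction_in_fragment) ** n_fragments
    return 1.0 - p_intact


# 10% of sequenced NS3 fragments carried stop codons; fragment = 2.8% of ORF
estimate = fraction_defective(0.10, 0.028)
print(f"{estimate:.1%}")
```

Under this simple model the large majority of genomes would carry at least one stop codon somewhere in the open reading frame, which is in line with the conclusion drawn in the text.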

To ensure that the observed heterogeneity was not due to
nucleotide misincorporations introduced by the reverse
transcriptase or the Taq DNA polymerase during the amplification
reaction, two control experiments were carried out.
First, one of the NS3 recombinant clones of known sequence 
was in vitro transcribed with SP6 RNA polymerase (Boehr- 
inger Mannheim). The RNA transcript was subjected to the 
original RT-PCR amplification procedure under identical 
conditions, and the amplified product was subcloned. Se- 
quence analysis of five independent clones showed absolute 
identity with the parental clone from which the RNA transcript
had been obtained. In a second experiment, the cDNA
insert of one of the sequenced 5'UTR recombinant clones 
was subjected to 35 additional cycles of PCR amplification 
and subcloned. Within 23 such clones (5,451 bases sequenced),
only one nucleotide change (A → G) was detected
in one of the clones. Therefore, it seems that the observed 
heterogeneity was not an artifact, although some of the base 
substitutions might have been introduced during the ampli- 
fication procedure. 
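
The size of the gap between the control and the observed heterogeneity can be made concrete with a rough per-base comparison. This is only a sketch: the control rate rests on a single event, so it is imprecise, and the figure of 240 bases sequenced per NS3 clone is an assumption taken from the size of the cloned restriction fragment rather than a number stated for the sequencing itself.

```python
def substitutions_per_base(n_substitutions, n_clones, bases_per_clone):
    """Observed substitutions divided by the total number of bases read."""
    return n_substitutions / (n_clones * bases_per_clone)


# Control from the text: 1 change in 5,451 bases after 35 extra PCR cycles
control_rate = 1 / 5451

# NS3 clones from sample VL-1: 25 substitutions in 20 clones.
# 240 bases per clone is an assumption (size of the cloned fragment).
ns3_rate = substitutions_per_base(25, 20, 240)

fold_over_control = ns3_rate / control_rate
print(f"control: {control_rate:.1e} per base")
print(f"NS3:     {ns3_rate:.1e} per base")
print(f"ratio:   {fold_over_control:.0f}x")
```

On these assumptions the substitution frequency among the NS3 clones is more than an order of magnitude above the PCR control.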

With such a low level of misincorporation noise, the 
finding that two distinct and well-conserved regions of the 
viral genome (17) were composed of a mixture of cocircu- 
lating related genomes distributed as a predominant or 
master sequence and a spectrum of mutants (representing 
altogether about half of the genomic population) provides 
unequivocal evidence of the quasispecies structure of the 
HCV genome. Additional sequencing of 6 to 10 independent 
5'UTR recombinant clones, generated from subsequent sam- 
ples obtained later within the acute phase (at 3 months 
posttransfusion) and during the chronic phase (at 6 and 29
months posttransfusion), showed similar distribution and
complexity of the quasispecies for that region again with 
randomly distributed single nonrepetitive base substitutions 
in each of the mutants around a conserved master sequence 
(Table 1). This finding demonstrates that the observed qua- 
sispecies distribution was not a phenomenon restricted to 
the acute phase of the infection. 

Demonstration of the quasispecies structure of the HCV 
genome implies the need for a population-based approach 
when one is analyzing HCV genomes. Accordingly, the 
average or consensus sequence of each nucleic acid region 
should always be carefully defined when one is trying to 
determine the complete genomic sequence of a given isolate. 
Otherwise, reconstruction of the HCV genome by randomly 
sequencing overlapping clones of PCR products obtained 
from plasma pools of different subjects (or even from a single 
patient) would generate an artifactual genomic mosaicism.

Given our limited knowledge of the biological functions of 
most HCV genome regions and of their encoded products, 
besides the potential problems for vaccine development, it is 
not possible at present to anticipate which of the several 
relevant implications predicted by the quasispecies model 
will be applicable to HCV. One may speculate, however, on 
two possible mechanisms which might play a role in the two 
most striking features of HCV infection: its high tendency 
toward viral persistence and the characteristic smoldering 
and fluctuating course of chronic hepatitis C. On the one 
hand, the high degree of heterogeneity of the envelope- 
FIG. 2. Nucleotide and amino acid sequences of 20 independent NS3/NS4 (a) and 5'UTR (b) clones from sample VL-1 (see Table 1).
Relative frequencies are shown on the right. Mutant sequences have been aligned to the master (most frequently represented) sequence for 
each region. The one-letter code has been used for protein sequences. Dots indicate sequence identity. Only nucleotides which differ from 
the master sequence are shown. Asterisks indicate in-frame stop codons. Conservative amino acid substitutions are underlined; those in 
brackets correspond to nonconservative changes. 
encoding regions and its potential for rapid evolution (as
anticipated by the quasispecies model) might lead to rapid 
antigenic changes in structural proteins likely to contain 
neutralizing antibody and/or cytotoxic T-cell-specific
epitopes and hence provide the virus the opportunity to 
avoid the previous response of the host immune system. On 
the other hand, if our assumption that most circulating viral 
particles contain defective genomes is true, as it has been 
postulated for other viruses (25), this would provide an
alternative or supplementary mechanism for HCV persis- 
tence. Indeed, there is a wealth of data indicating how 
defective RNA genomes may restrict viral replication in 
vivo, modulate the clinical expression of the disease, and 
lead to the establishment of persistent viral infection (2, 14, 
18). Whether, or the extent to which, these mechanisms play 
a role in the natural history of HCV infection remains to be 
determined. The widespread nature of chronic HCV infec- 
tion and its long-term clinical consequences, however, war- 
rant further studies on any potential mechanism of HCV 
persistence. 
HIV-1 and HCV co-infected patients: detection of active viral
expression using a nested polymerase chain reaction 
The aim of the present study was to determine, in a population of 70 HIV-1 infected patients with
antibodies to HCV, the percentages of individuals with an active replication of HIV-1, HCV or both.
During a one year follow-up of these patients at different stages of disease, blood samples were
regularly collected for determination of transaminases, β2 microglobulin and CD4+ lymphocytes.
Total RNAs were extracted from the sera and retrotranscribed with MoMuLV reverse transcriptase, and
nested PCR assays were carried out separately with sets of primers homologous to the 5' non-translated
region of HCV and to HIV-1 gag. The amplified products were subjected to electrophoresis
and observed under u.v. illumination after staining with ethidium bromide. For some samples, the
identity of the amplified products was confirmed by Southern blotting and hybridization with
enzyme-labelled probes.

A total of 57% of the patients were found to produce HIV-1 RNA and 62% HCV RNA, while 34%
produced both. HIV-1 RNA production was correlated with β2 microglobulin and CD4+ levels; active
replication of HCV was correlated with hepatitis but not with CD4+ levels nor with HIV-1 RNA
synthesis.

KEYWORDS: HIV-1, HCV, RNA, polymerase chain reaction. 



INTRODUCTION 

Some years ago, HIV-1 infected patients were considered
to have active viral transcription and viral
synthesis when entering stage IV of the CDC classification;
these data were mainly based on the detection
of HIV-1 antigenemia (HIV-1 Ag). More recently,
virus titration in the plasma, detection of viral RNA
in peripheral blood mononuclear cells (PBMC) and/or
plasma by polymerase chain reaction (PCR) after a
reverse transcription (RT) step, or in situ hybridization
(ISH) with RNA probes has shown that asymptomatic
patients (at stage II of the CDC) are likely to
transcribe viral and/or messenger HIV-1 RNAs and to
produce infectious virus particles. These data can
have therapeutic implications, since it now seems
that there is no real state of latency after primary
infection.

Some of these patients, mainly intravenous drug
users, can be co-infected with hepatitis C virus (HCV).
HCV primary infection is frequently followed by a
chronic state. While the detection of anti-HCV antibodies
is useful for diagnosis, they cannot provide
reliable data as to the level of viral synthesis. This
synthesis can be determined by RT-PCR for the
detection of HCV RNA.

The aim of the present study was to evaluate, in a
population of 70 HIV-1 infected patients exhibiting
antibodies to HCV, the percentage of individuals with
active synthesis of HIV-1, HCV and both, as detected
by RT-PCR.



MATERIALS AND METHODS 

One hundred and nineteen HIV-1 infected patients
(Western blot confirmed) at different stages of the
disease, consecutively hospitalized between 1
October 1991 and 31 December 1991, were screened
for the presence of antibodies to HCV (Diagnostics
Pasteur EIA and Chiron RIBA confirmation); 70 of
them were found to be HIV-1/HCV co-infected (52 IV
drug abusers; 18 blood transfused). Whole blood
samples were collected from these patients; CD4+
lymphocytes were counted by flow cytometry.
HIV-1 Ag was determined by Diagnostics Pasteur EIA.
Serum β2 microglobulin was titrated using Behring
EIA. Transaminase levels were determined and chronic
hepatitis was defined as transaminase levels
higher than 1.5 times the normal level for more than
6 months. Aliquots of serum were immediately frozen
at -80°C before being used for PCR.

PCR for detection of HIV-1 and HCV RNAs
Preparation of RNA

RNA from 100 µl of serum was extracted using 0.5 ml
4 M guanidinium isothiocyanate, 25 mM sodium acetate,
pH 7, 0.5% sarcosyl, 7% β-mercaptoethanol,
50 µl 2 M sodium acetate, pH 4, 0.5 ml phenol, 100 µl
chloroform/isoamylalcohol (49:1). The mixture was
vigorously shaken, then allowed to stand for 15 min on
ice before being centrifuged at 12,000 g for 30 min at
4°C. This extraction step was followed by precipitation:
500 µl of the supernatant was added to 0.5 ml
isopropanol with 50 µl 3 M sodium acetate, pH 7, and
5 µl glycogen, then allowed to precipitate at -20°C
for 2 h before centrifugation at 12,000 g for 30 min.
The supernatant was discarded while the pellet was
washed with 2 ml of 75% ethanol and centrifuged for
15 min at 12,000 g. The pellet was dried and dissolved
in 50 µl RNase-free H2O, heated at 65°C for 20 min
before being frozen at -80°C.

Treatment with RNase-free DNase

A total of 10 µl of RNA was treated for 1 h at room
temperature with 30 u of RNase-free DNase; the
reaction was stopped by heating for 5 min at 94°C.



Reverse transcription 

Reverse transcription was carried out for 1 h at 42°C
with 2 µl 10× buffer (500 mM KCl, 100 mM Tris-HCl,
pH 8.3, 15 mM MgCl2, 0.1% gelatin), 1 µl dNTP (10 mM
each), 1 µl virus-specific antisense primer (100 pM) of
the first PCR (see below), 1 µl RNasin (20 u), 1 µl 50 mM
MgCl2, 3 µl H2O, 1 µl MoMuLV (Moloney Murine
Leukaemia Virus) reverse transcriptase (100-200 u)
and 10 µl RNA. The reaction was stopped by heating
at 95°C for 5 min. HIV-1 and HCV PCR were then
performed using a nested technique.
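
As a quick check on the reverse transcription mix, the listed volumes can be summed and the stock concentrations converted to final concentrations. This is a sketch that assumes the salts listed after "10× buffer" describe the 10× stock (including its 15 mM MgCl2) and that the 1 µl of 50 mM MgCl2 is added on top; the variable names are illustrative.

```python
# Component volumes in microlitres, as listed in the text
volumes_ul = {
    "10x buffer": 2,
    "dNTP mix": 1,
    "antisense primer": 1,
    "RNasin": 1,
    "50 mM MgCl2": 1,
    "H2O": 3,
    "MoMuLV RT": 1,
    "RNA": 10,
}
total_ul = sum(volumes_ul.values())


def final_conc(stock_conc, volume_ul):
    """Dilute a stock concentration into the total reaction volume."""
    return stock_conc * volume_ul / total_ul


buffer_ul = volumes_ul["10x buffer"]
final_mM = {
    "KCl": final_conc(500, buffer_ul),
    "Tris-HCl": final_conc(100, buffer_ul),
    # MgCl2 comes from the buffer stock plus the separate 50 mM addition
    "MgCl2": final_conc(15, buffer_ul) + final_conc(50, volumes_ul["50 mM MgCl2"]),
    "each dNTP": final_conc(10, volumes_ul["dNTP mix"]),
}

print(f"total volume: {total_ul} µl")
for name, conc in final_mM.items():
    print(f"{name}: {conc:g} mM")
```

The volumes add up to 20 µl, so the 2 µl of 10× buffer is diluted to 1×, as expected for this kind of reaction.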

Nested PCR for HIV-1

Two sets of primers were used to amplify a conserved
region of HIV-1 gag (Fig. 1). The outer primers
were 5' ATAATCCACCTATCCCAGTAGGAG 3' (SK 38)
(sense) and 5' TGACATGCTGTCATCATTTCTTC 3'
(antisense); the inner primers were SK 38 (sense) and
5' TTTGGTCCTTGTCTTATGTCCAGAATGC 3' (SK 39)
(antisense).

To 50 µl of the reaction mixture was added the
cDNA, 30 pM of the primers, 10 mM of each dNTP,
10 mM Tris-HCl, pH 8.3, 1.5 mM MgCl2, 50 mM KCl and
2.5 u of Taq polymerase (Boehringer). The mixture was
then overlaid with 50 µl of mineral oil and the tubes
were placed in a DNA thermal cycler (Perkin-Elmer
Cetus).

The parameters of both rounds of amplification
were identical: 10 min at 94°C followed by 35 cycles
of 1 min at 94°C, 1 min at 51°C and 1 min at 72°C.
Ten microlitres of the amplified products of both
rounds, corresponding to 276 and 115 bp, were subjected
to electrophoresis in a 3% agarose gel
(NuSieve, Sigma) and stained with ethidium bromide.
The presence of a 115 bp band in the second round
was considered a positive result.

Nested PCR for HCV 

All primers used had been selected to amplify the
highly conserved 5' non-translated region of the
HCV genome. The outer primers were 5'
TGGGGGCGACACTCCACCATAGAT 3' (sense) and
5' CGTGCTCATGGTGCACGGTCTACGAGA 3' (antisense),
and the parameters of the first round were as
follows: 10 min at 94°C, then 35 cycles of 1 min at
94°C, 2 min of annealing at 50°C and 3 min at 72°C.

The inner primers were 5' CCACCATAGATCACTCCCCTGT 3'
(sense) and 5' CACTCGCAAGCACCCTATCAGGCAGT 3'
(antisense). The parameters were 10 min at 94°C
followed by 35 cycles of
1 min at 94°C, 2 min at 46°C and 3 min at 72°C.
Fig. 1. Positions of the primers and probes used for nested PCR amplification and hybridization of (a) HIV-1
gag and (b) HCV.

(a) HIV-1 (5' LTR [U3, R, U5], gag, 3' LTR)
P1: 5' ATAATCCACCTATCCCAGTAGGAG 3'
P2: 5' TGACATGCTGTCATCATTTCTTC 3'
P3: 5' TTTGGTCCTTGTCTTATGTCCAGAATGC 3'
Probe: 5' ATCCTGGGATTAAATAAAATAGTAAGAATGTATAGCCCTAC 3'

(b) HCV (ORF: C, E1/S, E2/NS1, NS2, NS3, NS4, NS5)
NC1: 5' TGGGGGCGACACTCCACCATAGAT 3'
ANC1: 5' CGTGCTCATGGTGCACGGTCTACGAGA 3'
NC2: 5' CCACCATAGATCACTCCCCTGT 3'
ANC2: 5' CACTCGCAAGCACCCTATCAGGCAGT 3'
Probe: 5' TACACCGGAATTGCCAGGACGACCGGGTCCTTTCTTGGAT 3'



The amplification was carried out as described
above for HIV-1; 10 µl of the amplified products of
both rounds (yielding 340 and 286 bp products,
respectively) were subjected to electrophoresis in a 3%
agarose gel and stained with ethidium bromide. The
presence of a 286 bp band in the second amplification
was considered a positive result.



Hybridization

For some samples, and in order to check the specificity
of the amplified products, the DNAs were transferred
to a Hybond N+ membrane (Amersham).
Probes specific to HCV (5' TACACCGGAATTGCCAGGACGACCGGGTCCTTTCTTGGAT 3')
and to HIV-1 gag (5' ATCCTGGGATTAAATAAAATAGTAAGAATGTATAGCCCTAC 3')
were labelled using a digoxigenin
oligonucleotide tailing kit (Boehringer). Hybridization
was carried out overnight at 52°C and the reaction
was developed using a digoxigenin luminescent kit
(Boehringer).

Statistical analysis

Chi-square analyses were used to study the distribution
of qualitative parameters and Student's t-test for
comparison of means.


RESULTS 

As described above, the presence of 115 and 286 bp
bands in the agarose gel after the second PCR round
was considered positive for HIV-1 and HCV RNA,
respectively (Fig. 2). The presence of a band after the
first round (276 and 340 bp, respectively) was always
followed by the appearance of the expected band at
the second round. When Southern blot hybridization
with probes (specific for HIV-1 and HCV, respectively)
Fig. 2. (a) Agarose gel electrophoresis of amplified PCR products from seven anti-HCV reactive sera samples
and (b) Southern blot hybridization analysis of HCV amplified products. Upper part, first round of PCR;
bottom part, second round of PCR. Lanes V and VI, molecular weight markers; lanes T- and T+, PCR negative
and positive controls.



Table 1. HCV RNA and hepatitis

              HCV RNA
Hepatitis     Negative   Positive
Negative      16         5
Positive      6          31

P<5x 10"*



Table 2. HCV RNA and CD4+ lymphocytes (at entry)

No. of patients   HCV RNA    CD4+
36                Positive   199 ± 180
22                Negative   144 ± 190

NS



was carried out, the specificity of the observed bands
could be confirmed in all cases (Fig. 2).



HCV RNA and hepatitis 

Among 58 patients studied with the nested PCR, 36
(62%) exhibited serum HCV RNA; 31 had a chronic
hepatitis. The association between chronic hepatitis
and circulating HCV RNA was highly significant
(P<6-10"^) (Table 1). Only six patients with anti-HCV
antibodies and chronic hepatitis failed to produce
HCV RNA in plasma; all had another cause of hepatitis:
three were treated for tuberculosis, two had
systemic CMV infection and one patient received
valproic acid.
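
The association described above can be re-derived from the Table 1 counts with a Pearson chi-square test on the 2×2 table. This is a sketch only: the text does not say whether a continuity correction was applied, so the uncorrected statistic is used here as an assumption, and the function name is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Table 1 counts: rows = hepatitis (negative, positive),
# columns = HCV RNA (negative, positive)
chi2, p_value = chi_square_2x2(16, 5, 6, 31)
print(f"chi2 = {chi2:.2f}, P = {p_value:.1e}")
```

A statistic of this size on one degree of freedom corresponds to a very small P value, consistent with the highly significant association reported.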



Table 3. HCV RNA and β2 microglobulin

No. of patients   HCV RNA    β2 microglobulin (mg l⁻¹)
29                Positive   3998 ± 1281
22                Negative   3416 ± 803

NS



HCV RNA, CD4+ lymphocytes and β2
microglobulin

Mean CD4+ lymphocyte counts (Table 2) and β2
microglobulin levels (Table 3) were not significantly
different between patients with and without release
of HCV RNA within the serum (P = 0.28 and P = 0.12,
respectively).
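
The comparison of mean CD4+ counts can be reproduced approximately from the summary values in Table 2. This sketch assumes that the "±" values are standard deviations (the text does not say) and uses the equal-variance form of Student's t-test; only the t statistic and degrees of freedom are computed.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal variances) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Table 2: CD4+ counts in HCV RNA-positive vs RNA-negative patients,
# treating the "±" values as standard deviations (assumption)
t_stat, df = pooled_t(199, 180, 36, 144, 190, 22)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The resulting t value lies well below the two-sided 5% critical value of about 2.0 at this number of degrees of freedom, in agreement with the non-significant difference reported.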
Table 4. HIV-1 RNA and CD4+ lymphocytes

No. of patients   HIV-1 RNA   CD4+ (µl⁻¹)
33                Positive    99 ± 133
25                Negative    267 ± 184

P < 0.02



Table 5. HIV-1 RNA and β2 microglobulin

No. of patients   HIV-1 RNA   β2 microglobulin (mg l⁻¹)
33                Positive    4974 ± 2793
25                Negative    3272 ± 700

P<4xio-



Table 6. HIV-1 and HCV RNAs

              HIV-1 RNA
HCV RNA       Negative   Positive
Negative      8          13
Positive      17         20


HIV-1 RNA, CD4+ lymphocytes and β2
microglobulin

Overall, HIV-1 RNA was detected in 33 out of 58
(57%) patients. The CD4+ lymphocyte counts were
lower in patients with circulating HIV-1 RNA (P<1-10 ^)
(Table 4). Higher β2 microglobulin levels were
observed in patients with positive HIV-1 RNA
(P<4X10-^) (Table 5).



HCV RNA and HIV-1 RNA 

Among the 58 investigated patients, eight had negative
signals while 20 (34%) had positive signals for
both HIV-1 and HCV RNAs; 13 exhibited only serum
HIV-1 RNA and 17 only HCV RNA (Table 6). The
presence of HCV RNA did not correlate with that of
HIV-1 RNA.
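
The lack of association between the two RNAs can be illustrated with an odds ratio computed from the four counts given above. The Wald confidence interval used here is a choice made for this sketch, not a method reported in the text, and the function name is illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for [[a, b], [c, d]] with a Wald confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Counts from the text: rows = HCV RNA (negative, positive),
# columns = HIV-1 RNA (negative, positive)
odds_ratio, low, high = odds_ratio_ci(8, 13, 17, 20)
print(f"OR = {odds_ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

An interval that spans 1 is what one would expect if detection of HCV RNA were unrelated to detection of HIV-1 RNA.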



DISCUSSION 

The aim of the present study was to determine, in a 
series of HIV-1 and HCV antibody-positive patients, 
the percentages of those with active replication of 
HIV-1, HCV and both. To achieve this goal, we used a 
nested PCR for detection of HIV-1 and HCV RNAs. 
We compared the observed data to biological para- 
meters of hepatitis and immunodeficiency. 

Thirty-three patients out of 58 (57%) showed
active replication of HIV-1, as detected with our
nested PCR. These results can be compared to those
obtained by other groups, ranging from 60% to 100%
depending on the clinical stages of the patients and CD4+
levels.

The correlation between the release of HIV-1 RNA
and low numbers of CD4+ lymphocytes is confirmatory,
since it is now well recognized that the synthesis
of HIV-1, measured by plasma viraemia determination
and/or PCR RNA, increases with the decrease
of CD4+ lymphocytes. Also confirmatory is the fact
that low counts of CD4+ lymphocytes are associated
with an increase of β2 microglobulin levels.

Thirty-six out of 58 (62%) HIV-infected
patients exhibited active replication of HCV. Considering
that synthesis of HCV RNA is associated with a
persistent infection, 62% of HIV-1/HCV co-infected
patients in our series are at risk for developing
chronic hepatitis. There was no correlation between
the presence of HCV RNA and the immune status
(CD4+ lymphocytes and β2 microglobulin) of the
patients; therefore, the immunodeficiency does not
appear to enhance HCV replication and HCV replication
does not appear to result in immune deterioration.
Clearly, the active synthesis of HCV is correlated
with hepatitis. It is interesting to note that some
patients (five in our series) release HCV within the
plasma without exhibiting clinical signs of hepatitis,
as has been observed by others. In addition, 20
patients out of 58 (34%) exhibited viral replication of
both viruses simultaneously; the statistical analysis
indicates, however, that there is no correlation
between the presence of HIV-1 RNA and that of HCV
RNA. Therefore, active replication of HIV-1 and HCV
might not affect each other. It has been shown
recently that HCV can infect PBMC. These new
data should stimulate in vitro studies of the interactions
between HIV-1 and HCV in PBMC cultures, and
in the peripheral blood cells of infected patients.